Darwins Frog Monomorphic Call Structure and Dimorphic Vocal Phenology

# require = will install and load, library can only load installed
# packages
require(tidyverse)
require(ggplot2)  # pretty plots
require(dplyr)  # dataframe manipulations and  %>% 
require(infer)  #  statistics
require(tidyr)  # nice tables that look nice
require(car)  # ANOVA, also used by the authors for ANOVA
require(mosaic)  # histogram, instead of hist (contains {lattice})
require(formatR)  #to format R
require(knitr)  # inserting images
require(kableExtra)  # modifying kable tables
require(skimr)  # for data overview, data visualization 

introduction:

[Include a brief summary of the paper you are reanalyzing data from (e.g., the overall objective of the paper, the types of data collected and how sampling was done, what the main results were) and lay out what you intend to replicate.]

background

The southern Darwin’s frog (Rhinoderma darwinii) mouth-brooding frog species endemic to Chilean Patagonia. R. darwinii has a fascinating method for reproduction: the female (egg-producer) deposits eggs and the male (sperm-producer) fertilizes them, then guards them until the embryos are visibly wriggling inside. Then the male transfers those eggs to their vocal sac where they will be safe from predators and receive nutrients from the male (similar to marsupial embryonic development in their pouch). Ultimately, the male broods these fertilized eggs, often from multiple clutches with different partners, in their vocal sacs for about 50 days, until fully developed froglets emerge by pushing their way out the male’s mouth! All a while during this gestation period, the male continues to call for mates (Sandmeier, 2016)! There are 3 sex roles in this species: pregnant male (MP), non-pregnant male (M or NPM), and female (H or F), and the main sexually dimorphic characteristic is size, where the egg-producers (F) are slightly larger. In this nearly sexually monomorphic (i.e., the sexes look physically identical) species, mate selection depends on advertising calls, and R. darwinii is one of the few species where all sexes (PM, NPM, and F) use advertising calls to attract the attention of a potential mate (Serrano et al., 2020).

hypothesis

To investigate the factors modulating monomorphic vs. dimorphic sexual signals in the R. darwinii, Serrano et al. (2020) hypothesized species with temporally variable periods of intrasexual competition will have monomorphic instead of dimorphic sexual signals to ensure conspecific recognition. In R. darwinii, they specifically focus on timing/availability of advertising calls for mate attraction used by calling (pregnant and non-pregnant) males and calling females (i.e., vocal phenology) (Serrano et al., 2020).

methods

They investigated this with a population of southern R. darwinii on Chiolé Island in Southern Chile during mating season (October 2015-February 2016). Of the 156 frogs caught in the field (118 adults), they recorded calls from 32 of them (Plot1) In the field, they first recorded individual calls (tracks) then collected population monitoring data (snout-vent length, SVL (mm); weight (g)). They also collected data on the sex and sexual status (MP = pregnant male, H = female, M = non-pregnant male) of each frog caught using body size and morphological characteristics. For the pregnant males, they counted externally the number of larvae in the vocal sacs. For each call recordings, they measured the call repetition rate (CRR, number of calls made in a 5 min period after the first call produced), the sound pressure level (SPL, dB), call duration (CD, seconds), the number of notes per call (NC), the note duration (ND, ms), the dominant frequency of the call (DF, Hz); and the amplitude of each vocalization (root mean square (RMS) amplitude) (Serrano et al., 2020).
The Serrano et al. (2020) used Simple Pearson correlations to explore the association used to explore association between acoustic variables of the calls (CRR, SPL, CD, NC, ND, DF) with physical characteristic variables (SVL, weight) (data not shown in article). They used Spearman correlations to determine the association between acoustic properties of the calls of pregnant males and the number of tadpoles in their vocal sacs (data not shown in article). The authors used an ANOVA to investigate the differences in acoustic variables depended on the relative body size of the sexes (Table 1), then they conducted a discriminant function analysis (DFA) for the note duration, dominant frequency, sound pressure level, and aggregated entropy of 4-notes calls (the most common call structure), then analyzed using two linear discriminant vectors (LD) (Figure 4) (Serrano et al., 2020).

results

Ultimately, Serrano et al. (2020) found females, who have larger body size (SVL and weight) than both male sexes, had longer note duration and call duration than any males (Table 1, Figure 3), and, even when data was corrected for SVL, no acoustic variable had a significant difference between the sexes, based on their ANOVA (Table 1), suggesting body size explains any variation among the acoustic variables and a monomorphic call structure. They also found a negative relationship between number of tadpoles in the vocal sacs of pregnant males and their call amplitude, indicating the advertising call of pregnant males is attenuated by an increased number of tadpoles AND could be an auditory indicator of tadpoles thus could be used for selecting a mate (data not shown; just very cool! This is something several other students and I studied with the first author of this paper in the field in 2019)! Ultimately, they propose mutual selective pressures of the different sexes might contribute to social recognition and sexual status recognition (i.e., recognition of reporoductive state, active or not) of individuals (Serrano et al., 2020).

data replication

I will be replicating the descriptive statistic analysis of the qqPlots, Simple Pearson Correlation, Simple Spearman correlation to explore normality trends in the data, just as the authors did, and ANOVA presented in Table 1 of Serrano et al. (2020). THen I will be replicating the Inferential statistical analysis of acoustic variables note duration, call duration, and dominant frequency (figure 3). Lastly, I will replicate the linear discrimination analysis of 5 acoustic variables (Dominant Frequency, Call Rate, Note Duration, Aggregated entropy, and Sound Pressure Level) (figure 4).

visualization of data

load in dataset:

f <- "https://raw.githubusercontent.com/slcornett/data-analysis-replication/main/data/Serrano_et_al_2020_MPMH.csv"
d <- read_csv(f, col_names = TRUE, show_col_types = FALSE)  # show column names, hides dataframe message details
d <- d %>%
    mutate(CRR = (Calls5min/5))  # creating calls per minute call repetition rate for ease in future calculations

data summary statistics

# Data summary 
s <- skim(d) # to make a summary table of skim(d) output
# character data
s %>% dplyr::filter(skim_type == "character") %>% 
  dplyr::select(skim_variable, n_missing, character.min, character.max, character.empty, character.n_unique) %>% 
  kable(align = 'l', booktabs = TRUE) %>% 
  kable_styling(font_size = 11, full_width = TRUE)
skim_variable n_missing character.min character.max character.empty character.n_unique
Nombre 0 3 15 0 30
Captura 0 3 4 0 31
Sex 0 7 18 0 3
Sexo 0 1 2 0 3
Track 0 4 4 0 31
# numeric data modify the specific output
s %>% dplyr::filter(skim_type == "numeric") %>%
  dplyr::select(skim_variable, n_missing, numeric.mean, numeric.sd, numeric.p0, numeric.p25, numeric.p50, numeric.p75, numeric.p100, numeric.hist) %>%
  kable(align = 'l', booktabs = TRUE) %>% # left aligned data
  kable_styling(font_size = 11, full_width = TRUE) # font-size
skim_variable n_missing numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
Calls5min 0 7.290323e+00 2.734880e+00 3.000000 5.000000 8.000000e+00 9.000000e+00 1.200000e+01 ▅▅▇▅▅
Temp 22 1.938889e+01 3.077923e+00 15.400000 18.200000 1.900000e+01 2.030000e+01 2.560000e+01 ▅▇▅▂▂
HR 22 7.883333e+01 4.238514e+00 71.500000 75.100000 7.920000e+01 8.210000e+01 8.420000e+01 ▂▃▂▂▇
Larvas 20 3.090909e+00 1.221028e+00 2.000000 2.000000 3.000000e+00 4.000000e+00 5.000000e+00 ▇▃▁▃▃
LHC 1 2.242667e+01 9.351132e-01 20.600000 21.800000 2.235000e+01 2.307500e+01 2.520000e+01 ▂▇▅▂▁
Peso 1 9.356667e-01 1.389042e-01 0.680000 0.837500 9.200000e-01 1.045000e+00 1.190000e+00 ▇▆▇▆▇
SPL 6 6.336860e+01 4.117350e+00 53.418182 61.700000 6.322143e+01 6.631667e+01 7.143103e+01 ▂▃▇▇▃
NC 0 1.474194e+01 7.763022e+00 4.000000 8.000000 1.400000e+01 1.800000e+01 3.300000e+01 ▆▇▃▃▂
ND 0 1.684006e-01 3.099900e-02 0.115131 0.142706 1.624118e-01 1.890287e-01 2.565455e-01 ▅▇▃▃▁
CD 3 1.670783e+00 2.164759e-01 1.311454 1.510821 1.636583e+00 1.838234e+00 2.141000e+00 ▅▇▅▃▃
Agg Entropy 0 2.106072e+00 5.376220e-01 1.379379 1.666791 1.942529e+00 2.477863e+00 3.161019e+00 ▇▆▃▂▃
Avg Entropy 0 1.970192e+00 3.396755e-01 1.524621 1.723484 1.851720e+00 2.123599e+00 2.721850e+00 ▇▇▃▃▂
DF 0 3.597235e+03 2.901571e+02 2995.193548 3389.708648 3.617600e+03 3.793418e+03 4.272157e+03 ▂▆▇▅▂
RMS Amp 0 1.005410e+05 7.949927e+04 12731.900000 53886.150000 6.422570e+04 1.228883e+05 3.560384e+05 ▇▃▁▁▁
CRR 0 1.458065e+00 5.469760e-01 0.600000 1.000000 1.600000e+00 1.800000e+00 2.400000e+00 ▅▅▇▅▅
detach(package:skimr)

Abbreviation definitions:

DESCRIPTIONS/IDENTIFIERS:
• Nombre = name given to ID the frog;
• Captura = number captured/image name;
• Track = the call recording track
PHYSICAL MEASUREMENTS:
• sexo = sexual status (MP = pregnant male, H = female, M = non-pregnant male)
• peso = mass (g);
• LHC = longitud hocico a cola (snout-vent length, mm);
• temp = temperature (ºC);
• HR = relative humidity;
• larvas = number of tadpoles/offspring in their parent’s mouth!
ACOUSTIC VARIABLES:
• Calls5min = calls in 5 min interval;
• SPL = Sound Pressure Level (dB);
• NC = number of notes/call;
• ND = note duration (either s or ms);
• CD = call duration (s);
• Agg Entropy = aggregate entropy;
• Avg Entropy = average entropy;
• DF = dominant frequency of call (Hz);
• RMS Amp = Root Mean Square Amplitude
• CRR = Call Repetition Rate (Calls/min, i.e., Calls5min/5)

exploratory statistics and comparisons of data:

require(cowplot) # for grid plotting
# not included in article as a graph but i think it's nice to show for reference to the methods
p1 <- ggplot(data = d,  # object created
             aes(x = Sex, # sex of individuals in the dataset
                 y = LHC)) + # y =  SVL, one value dropped b/c NA  
  geom_boxplot(na.rm = TRUE, show.legend = FALSE, aes(color = Sex)) +
  labs(x = "Sex of Frog at time of Capture", y = "Snout-Vent Length (mm)") +
  ggtitle("Body Size of R. Darwinii Frogs Incluced in this Study") +
  theme_classic() + # don't need the legend because x is categorical
  theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
        axis.title.x = element_text(size = 8, color = "dark green"),
        axis.title.y = element_text(size = 8, color = "dark green"))
# CRR by Sex
p2 <- ggplot(data = d, aes(x = Sex, y = CRR, color = Sex)) + 
  geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # don't need the legend because x is categorical
  labs(x = "Sex", y = "Calls/min") +
  ggtitle("Call Repetition Rate by Sex") +
  theme_classic() +
  theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
        axis.title.x = element_text(size = 8, color = "dark green"),
        axis.title.y = element_text(size = 8, color = "dark green")) 
# Notes per Call (NC) vs Sex
p3 <- ggplot(data = d, aes(x = Sex, y = NC, color = Sex)) + 
  geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # don't need the legend because x is categorical
  labs(x = "Sex", y = "Notes per Call") +
  ggtitle("Notes per Call by Sex") + 
  theme_classic() +
  theme(plot.title = element_text(size = 10, color = "dark green",face = "bold"),
        axis.title.x = element_text(size = 8, color = "dark green"),
        axis.title.y = element_text(size = 8, color = "dark green"))
# Call Repetition Rate (CRR) histogram
p4 <- ggplot(data = d, aes(x = CRR, color = Sex)) +
  geom_histogram(na.rm = TRUE, show.legend = FALSE) + facet_wrap( ~ Sex) + 
  labs(x = "Calls per Minute", y = "Frequency of Call Rate") +
  ggtitle("Histogram of Call Repetition Rate") + 
  theme_classic() +
  theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
        axis.title.x = element_text(size = 8, color = "dark green"),
        axis.title.y = element_text(size = 8, color = "dark green"))
# pregnant males : number of tadpoles vs call amplitude
p5 <- ggplot(data = d, aes(x = as.factor(Larvas), y = `RMS Amp`, color = Larvas)) + 
  geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # filtering out all but pregnant males because other sexes don't have tadpoles.
  xlab("Number of Tadpoles") +
  ylab("Call Amplitude (dB)") +
  ggtitle("Pregnant Males: Number of Tadpoles vs Call Amplitude") +
  theme_classic() +
  theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
        axis.title.x = element_text(size = 8, color = "dark green"),
        axis.title.y = element_text(size = 8, color = "dark green"))
# plotting the graphs together
plot_grid(p1, p2, p3, p4, p5, label_size = 8, nrow = 3)

Statistical Replications/Reanalysis

Be sure to thoroughly explain what replications you are doing and comment your code so that it is easy for a reader to understand.
Include in this section relevant tables/figures/values from the original paper for comparison to what you accomplished with your replication.
Note that I want you to do the bulk of any exposition using text and markdown syntax outside of code blocks. That is, your document should not just be one big code block with R style comments but rather a nicely formatted report with code separated from exposition, interpretation, and discussion.]

# making a dataframe of the relevant variables so don't change the original dataset. 
df<- d %>% dplyr::select(Nombre, Sex, Larvas, LHC, Peso, Calls5min, CRR, NC, ND, CD, DF, SPL, `RMS Amp`, `Agg Entropy`, `Avg Entropy`)
# renaming Sex variables in dataframe so will print nicer in the Tukey post-hoc. yes I could just use Sexo but I don't have to change all the code below if i do it like this. I didn't realize the labels were going to be a problem until after I wrote all the code for replication. Also now my visualization labels will match those used in the paper.  
df <- df %>%
    mutate(Sex = case_when(Sex == "Females" ~ "F", 
                           Sex == "Non-pregnant males" ~ "NPM", 
                           Sex == "Pregnant males" ~ "PM"))
print(df) %>% 
  kable(align = 'l', booktabs = TRUE) %>% # left aligned data
  kable_styling(font_size = 10, full_width = TRUE) # font-size# checking work
## # A tibble: 31 × 15
##    Nombre Sex   Larvas   LHC  Peso Calls5min   CRR    NC    ND    CD    DF   SPL
##    <chr>  <chr>  <dbl> <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 "Euge… F         NA  23.4  1.15         8   1.6    15 0.257  2.14 3341.  71.4
##  2 "Mich… NPM       NA  22.4 NA            4   0.8    17 0.156  1.56 3402.  63.2
##  3 "Dr\x… F         NA  23.7  1.1          5   1      12 0.196  2.01 3686.  NA  
##  4 "Herl… NPM       NA  22.7  0.92         5   1      33 0.182  1.72 3488.  62.8
##  5 "Luis… F         NA  23.2  1.06        11   2.2    14 0.157 NA    3682.  64.2
##  6 "Fema… PM         5  22.6  1.11         3   0.6    14 0.176  1.74 3634.  62.4
##  7 "Obam… PM         2  22.4  0.93         9   1.8    22 0.144  1.54 3230.  68.3
##  8 "Edga… PM         3  22.2  0.92         8   1.6    17 0.133  1.42 4272.  NA  
##  9 "Olli… PM         2  22.2  0.86         8   1.6    14 0.209  1.92 3818.  NA  
## 10 "Rube… PM         2  21.6  0.86         3   0.6     4 0.162  1.64 3273   67.6
## # … with 21 more rows, and 3 more variables: `RMS Amp` <dbl>,
## #   `Agg Entropy` <dbl>, `Avg Entropy` <dbl>
Nombre Sex Larvas LHC Peso Calls5min CRR NC ND CD DF SPL RMS Amp Agg Entropy Avg Entropy
Eugenia F NA 23.4 1.15 8 1.6 15 0.2565455 2.141000 3340.876 71.43103 162764.3 1.379379 1.524621
Michelle NPM NA 22.4 NA 4 0.8 17 0.1556250 1.557000 3402.231 63.22143 89155.9 2.427146 2.031938
Dr<e1>cula F NA 23.7 1.10 5 1.0 12 0.1964600 2.008200 3686.474 NA 53409.7 1.959700 1.851720
Herlinda NPM NA 22.7 0.92 5 1.0 33 0.1815974 1.723222 3487.814 62.77895 110800.1 2.882636 2.690169
Luisa F NA 23.2 1.06 11 2.2 14 0.1571250 NA 3682.167 64.16923 43954.0 1.926542 1.766208
Femalesrancisco PM 5 22.6 1.11 3 0.6 14 0.1764444 1.737125 3633.539 62.37500 62616.0 2.544982 2.458500
Obama PM 2 22.4 0.93 9 1.8 22 0.1435536 1.542750 3229.946 68.26000 219744.9 1.852411 1.757268
Edgar PM 3 22.2 0.92 8 1.6 17 0.1327429 1.419000 4272.157 NA 58691.5 1.734871 1.767586
Ollin PM 2 22.2 0.86 8 1.6 14 0.2088393 1.920333 3817.511 NA 105319.3 3.143518 2.380518
Ruben PM 2 21.6 0.86 3 0.6 4 0.1624118 1.643667 3273.000 67.56667 29010.9 1.593118 1.594235
Tehuacan NPM NA 21.8 0.83 11 2.2 8 0.2013824 1.629500 3951.965 58.16000 40606.4 1.942529 1.919500
Esteban NPM NA 20.6 0.78 9 1.8 10 0.1418584 1.311454 3840.141 67.59000 356038.4 2.610080 2.721850
Femaleslash NPM NA 21.8 0.76 11 2.2 19 0.1336744 1.627500 3377.186 NA 64225.7 1.739302 1.684233
267 PM 5 23.4 1.09 12 2.4 15 0.1678679 1.543800 4059.596 56.92857 73951.4 3.161019 2.587359
Cirilo PM 3 22.6 0.94 4 0.8 5 0.1598000 1.688200 3445.300 NA 54693.2 1.606550 1.788800
Abel PM 2 21.7 1.00 7 1.4 17 0.1519701 1.549688 3617.600 69.14000 140005.6 1.561761 1.645627
Izcoatl PM 2 22.3 0.90 4 0.8 7 0.1414857 NA 3652.040 64.54286 106579.7 1.791514 1.915029
Ismael NPM NA NA 0.74 8 1.6 11 0.1312581 1.386143 3923.200 NA 49986.7 1.601355 1.717290
Victoria F NA 23.6 1.11 8 1.6 8 0.1626774 1.518571 2995.194 58.87500 12731.9 2.528581 2.046064
Victor NPM NA 23.2 0.86 5 1.0 13 0.1403438 1.403000 3962.100 62.73750 55335.5 1.552844 1.688219
Melchor NPM NA 21.6 0.75 5 1.0 27 0.1409467 1.487571 3929.941 64.51000 134976.5 1.685573 1.697280
Femalesrancisco NPM NA 22.7 0.80 6 1.2 8 0.1151310 NA 3769.326 61.84667 189382.9 2.258976 1.729679
Xavier NPM NA 21.3 0.68 10 2.0 19 0.1580547 1.478333 3649.885 59.92400 311815.6 1.474633 1.605016
Ernesto NPM NA 21.8 0.78 6 1.2 15 0.1791346 1.850800 3538.088 53.41818 54669.0 1.980154 2.021135
350 F NA 21.9 0.90 7 1.4 8 0.1615600 2.005500 3341.940 64.53750 51199.9 2.132840 1.987320
352 F NA 23.2 1.13 12 2.4 12 0.2103800 1.862300 3593.478 61.70000 86866.9 2.416900 2.276980
Ele F NA 22.1 0.92 10 2.0 33 0.2107479 1.707458 3254.204 66.65769 72817.7 2.219202 2.009328
Lucrecia F NA 22.7 1.00 9 1.8 22 0.1979552 1.981750 3113.609 65.52143 42238.9 3.146298 2.457508
Socorro F NA 25.2 1.19 7 1.4 7 0.1683000 1.761000 3508.483 62.15000 54362.6 2.919900 2.201133
ChiFemales<fa> PM 4 20.9 1.00 8 1.6 27 0.2086509 1.834045 3547.705 59.85652 166881.2 1.648009 1.775726
Silvio PM 4 22.0 1.00 3 0.6 4 0.1658947 1.463000 3617.600 66.31667 61938.0 1.865895 1.778105
# dividing df by sex so easier to access variables. Commented  out the ones i don't actually need in the replication. 
# np.males <- df %>% dplyr::filter(Sex == "Non-pregnant males") #Non-pregnant males (N = 11) 
# head(np.males) 
p.males <- df %>% dplyr::filter(Sex == "PM") # Pregnant males (N = 11)
head(p.males)
## # A tibble: 6 × 15
##   Nombre  Sex   Larvas   LHC  Peso Calls5min   CRR    NC    ND    CD    DF   SPL
##   <chr>   <chr>  <dbl> <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Female… PM         5  22.6  1.11         3   0.6    14 0.176  1.74 3634.  62.4
## 2 Obama   PM         2  22.4  0.93         9   1.8    22 0.144  1.54 3230.  68.3
## 3 Edgar   PM         3  22.2  0.92         8   1.6    17 0.133  1.42 4272.  NA  
## 4 Ollin   PM         2  22.2  0.86         8   1.6    14 0.209  1.92 3818.  NA  
## 5 Ruben   PM         2  21.6  0.86         3   0.6     4 0.162  1.64 3273   67.6
## 6 267     PM         5  23.4  1.09        12   2.4    15 0.168  1.54 4060.  56.9
## # … with 3 more variables: `RMS Amp` <dbl>, `Agg Entropy` <dbl>,
## #   `Avg Entropy` <dbl>
# females <- df %>% dplyr::filter(Sex == "Females") # Females (N = 9, according to the article, N=10, but the dataset only has 9.)
# head(females)

Table 1: Average ± SD of Body Size and Acoustic Features for R. darwinii

[descriptive statistical analysis]

correlation tests

The authors reported performing the following statistical analyses before performing their ANOVA. As mentioned before, they used Simple Pearson Correlations to determine if the different acoustic variables correlate with either Snout-Vent Length (LHC, mm) or mass (g) (Serrano et al., 2020). They also performed Spearman Correlations to determine an association between acoustic variables and the number of tadpoles in the vocal sacs of pregnant males (Serrano et al., 2020). Below, I conduct these tests. Ultimately, I do not find them as informative as the QQ Plots above.

# SHOULD I TAKE THESE OUT? THEY AREN'T VERY INFORMATIVE.
# Pearson correlations between snout-vent length and acoustic variables
# Snout Vent Length (LHC, mm) vs Note Duration (s)
stats::cor.test(df$LHC, df$ND, method = "pearson")  # no correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df$LHC and df$ND
## t = 0.9008, df = 28, p-value = 0.3754
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2048321  0.4979823
## sample estimates:
##       cor 
## 0.1678215
# Snout Vent Length (LHC, mm) vs Call Duration (s)
stats::cor.test(df$LHC, df$CD, method = "pearson")  # no correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df$LHC and df$CD
## t = 1.5882, df = 25, p-value = 0.1248
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08731632  0.61231257
## sample estimates:
##       cor 
## 0.3027431
# Snout Vent Length (LHC, mm) vs Dominant Frequency (Hz)
stats::cor.test(df$LHC, df$DF, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  df$LHC and df$DF
## t = -0.60019, df = 28, p-value = 0.5532
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4545167  0.2580443
## sample estimates:
##        cor 
## -0.1127023
# Snout Vent Length (LHC, mm) vs Aggregate Entropy
stats::cor.test(df$LHC, df$`Agg Entropy`, method = "pearson")  # near correlation (p = 0.058)
## 
##  Pearson's product-moment correlation
## 
## data:  df$LHC and df$`Agg Entropy`
## t = 1.9699, df = 28, p-value = 0.05881
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.01301586  0.62997452
## sample estimates:
##       cor 
## 0.3488894
stats::cor.test(log(df$LHC), log(df$`Agg Entropy`), method = "pearson")  # log transformation does not help.
## 
##  Pearson's product-moment correlation
## 
## data:  log(df$LHC) and log(df$`Agg Entropy`)
## t = 1.875, df = 28, p-value = 0.07127
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02987812  0.61969111
## sample estimates:
##       cor 
## 0.3339862
# Snout Vent Length (LHC, mm) vs Avg Entropy
stats::cor.test(df$LHC, df$`Avg Entropy`, method = "pearson")  #no correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df$LHC and df$`Avg Entropy`
## t = 0.71073, df = 28, p-value = 0.4831
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2385911  0.4708103
## sample estimates:
##       cor 
## 0.1331208
# Snout Vent Length (LHC, mm) vs Root Mean Squares of call Amplitude
stats::cor.test(df$LHC, df$`RMS Amp`, method = "pearson")  # CORRELATED! (p = 0.01001)
## 
##  Pearson's product-moment correlation
## 
## data:  df$LHC and df$`RMS Amp`
## t = -2.7628, df = 28, p-value = 0.01001
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7054719 -0.1230934
## sample estimates:
##        cor 
## -0.4628373
# Snout Vent Length (LHC, mm) vs Call Repetition Rate (calls/min)
stats::cor.test(df$LHC, df$CRR, method = "pearson")  # super no correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df$LHC and df$CRR
## t = 0.076324, df = 28, p-value = 0.9397
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3476532  0.3727548
## sample estimates:
##        cor 
## 0.01442243
# Pearson correlations between weight and acoustic variables Weight (g) vs
# Note Duration (s)
stats::cor.test(df$Peso, df$ND, method = "pearson")  # CORRELATED! (p = 0.006363)
## 
##  Pearson's product-moment correlation
## 
## data:  df$Peso and df$ND
## t = 2.9494, df = 28, p-value = 0.006363
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1535281 0.7207203
## sample estimates:
##      cor 
## 0.486868
# Weight (g) vs Call Duration (s)
stats::cor.test(df$Peso, df$CD, method = "pearson")  # CORRELATED! (p = 0.0185)
## 
##  Pearson's product-moment correlation
## 
## data:  df$Peso and df$CD
## t = 2.5199, df = 25, p-value = 0.0185
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08448738 0.70883635
## sample estimates:
##       cor 
## 0.4500518
# Weight (g) vs Dominant Frequency (Hz)
stats::cor.test(df$Peso, df$DF, method = "pearson")  # no correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df$Peso and df$DF
## t = -1.4735, df = 28, p-value = 0.1518
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5731400  0.1018496
## sample estimates:
##       cor 
## -0.268263
# Weight (g) vs Aggregate Entropy of call
stats::cor.test(df$Peso, df$`Agg Entropy`, method = "pearson")  # not correlated
## 
##  Pearson's product-moment correlation
## 
## data:  df$Peso and df$`Agg Entropy`
## t = 1.812, df = 28, p-value = 0.08073
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.04110161  0.61272059
## sample estimates:
##       cor 
## 0.3239648
stats::cor.test(log(df$Peso), log(df$`Agg Entropy`), method = "pearson")  # log transformation did not help
## 
##  Pearson's product-moment correlation
## 
## data:  log(df$Peso) and log(df$`Agg Entropy`)
## t = 1.8005, df = 28, p-value = 0.08257
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.0431561  0.6114335
## sample estimates:
##       cor 
## 0.3221214
# Weight (g) vs Average Entropy of call
stats::cor.test(df$Peso, df$`Avg Entropy`, method = "pearson")  # no correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df$Peso and df$`Avg Entropy`
## t = 1.275, df = 28, p-value = 0.2128
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1376277  0.5482556
## sample estimates:
##       cor 
## 0.2342567
# Weight (g) vs Root Mean Squares of call Amplitude
stats::cor.test(df$Peso, df$`RMS Amp`, method = "pearson")  # no correlation, but very close (p = 0.05853)
## 
##  Pearson's product-moment correlation
## 
## data:  df$Peso and df$`RMS Amp`
## t = -1.9723, df = 28, p-value = 0.05853
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.63022222  0.01260513
## sample estimates:
##        cor 
## -0.3492501
# Weight (g) vs Call Repetition Rate (calls/min)
stats::cor.test(df$Peso, df$CRR, method = "pearson")  # no correlation at all
## 
##  Pearson's product-moment correlation
## 
## data:  df$Peso and df$CRR
## t = 0.11242, df = 28, p-value = 0.9113
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3416424  0.3786131
## sample estimates:
##        cor 
## 0.02124125
# Spearman correlations: number of tadpoles in pregnant males' vs acoustic
# variables Number tadpoles vs note duration (s)
stats::cor(p.males$Larvas, p.males$ND, method = "spearman")  # no correlation
## [1] 0.4227058
# Number tadpoles vs call duration (s)
stats::cor(p.males$Larvas, p.males$CD, method = "spearman")  # no correlation
## [1] NA
# Number tadpoles vs dominant frequency (Hz)
stats::cor(p.males$Larvas, p.males$DF, method = "spearman")  # no correlation
## [1] 0.2960874
# Number tadpoles vs Aggregate Entropy of call
stats::cor(p.males$Larvas, p.males$`Agg Entropy`, method = "spearman")  # no correlation
## [1] 0.4419197
# Number tadpoles vs Average Entropy of call
stats::cor(p.males$Larvas, p.males$`Avg Entropy`, method = "spearman")  # no correlation
## [1] 0.5620066
# Number tadpoles vs Aggregate Entropy of call
stats::cor(p.males$Larvas, p.males$`RMS Amp`, method = "spearman")  # no correlation
## [1] -0.201746
# Number tadpoles vs Call Repetition Rate (calls/min)
stats::cor(p.males$Larvas, p.males$CRR, method = "spearman")  # no correlation
## [1] -0.01716697

quantile-quantile plots

Serrano et al. (2020) first verified the normality criteria for all the variables in their data using quantile-quantile plots, likely to determine if any of their variables needed to be transformed before analyzing them with ANOVAs. Though they do not show these plots in their paper, I am replicating them below because the authors do not report which variables were transformed nor do they report how they were transformed, other than stating they used the Box Cox transformation.

require(ggpubr)  # for qqplots
require(cowplot)  # for wrapping the plots
# QQ Plots, will show warnings because there are NAs in the data. this is
# fine.
qp1 <- ggqqplot(df$LHC, title = "Snout-Vent Length (mm) QQ Plot") + font("title",
    size = 10, color = "dark green", face = "bold")  # normal
qp2 <- ggqqplot(df$Peso, title = "Weight (g) QQ Plot") + font("title", size = 10,
    color = "dark green", face = "bold")  # normal
qp3 <- ggqqplot(df$Calls5min, title = "Call Rate Interval (Calls/5min) QQ Plot") +
    font("title", size = 10, color = "dark green", face = "bold")  # normal and looks exactly the same as CRR variable!
qp4 <- ggqqplot(df$CRR, title = "Call Repetition Rate (Calls/min) QQ Plot") +
    font("title", size = 10, color = "dark green", face = "bold")  # derived this from Calls5min, normal 
qp5 <- ggqqplot(df$NC, title = "Notes per Call QQ Polt") + font("title", size = 10,
    color = "dark green", face = "bold")  # normal
qp6 <- ggqqplot(df$ND, title = "Note Duration (s) QQPlot") + font("title", size = 10,
    color = "dark green", face = "bold")  # normal
qp7 <- ggqqplot(df$CD, title = "Call Duration (s) QQ Plot") + font("title",
    size = 10, color = "dark green", face = "bold")  # normal
qp8 <- ggqqplot(df$DF, title = "Dominant Frequency (Hz) QQ Plot") + font("title",
    size = 10, color = "dark green", face = "bold")  # normal
qp9 <- ggqqplot(df$SPL, title = "Sound Pressure Level (dB) QQ Plot") + font("title",
    size = 10, color = "dark green", face = "bold")  # normal
qp10 <- ggqqplot(df$`Agg Entropy`, title = "Aggregate Entropy QQ Plot") + font("title",
    size = 10, color = "dark green", face = "bold")  # not normal
qp11 <- ggqqplot(df$`Avg Entropy`, title = "Average Entropy QQ Plot") + font("title",
    size = 10, color = "dark green", face = "bold")  # not normal
# plotting qq plots all together
plot_grid(qp1, qp2, qp3, qp4, label_size = 8, nrow = 2)

plot_grid(qp5, qp6, qp7, qp8, label_size = 8, nrow = 2)

plot_grid(qp9, qp10, qp11, label_size = 8, nrow = 1)

# detaching unneeded packages
detach(package:ggpubr)
detach(package:cowplot)

Based on these plots, the two variables that appear to not fit the normality criteria for ANOVA are the two acoustic variables of entropy (Average Entropy and Aggregated Entropy). As the authors describe in their Statistical methods, I will transform these data useing a boxcox() from the {MASS} package to determine how to best transform these variables for ANOVA.

Box-Cox transformations for non-normal variables in ANOVA

Here, I run Box-Cox transformation for linear models/ANOVAS to determine how to transform variables not fitting normality criterion based on qqplot. The boxcox() function computes and plots the log-likelihood for an object to show the exponent power (\(\lambda\)) the response variable in the object should be raised to transform the object to fit normality criteria for comparison in ANOVA. Based on the QQ plots above, the variables not meeting normality criteria are Avg Entropy and Agg Entropy, so I ran Box-Cox transformations on them to determine how to transform them for ANOVA. For each variable, I plotted the output to visualize what exponential power I should transform the variables, and added the transformed variables into the working dataframe (df).
Below are the arguments used in this function:
• The object argument is the response variables (Avg Entropy and Agg Entropy) against the predictor variables (Sex or Sex+Snout-Vent Length (LHC)) that will be compared in an ANOVA (or in a linear model, but not doing that here). • The lambda argument is the vector values to check as potential exponential powers to raise the variable to in a transformation. By default, the function uses (-2, 2) with 0.1 interval steps. Functionally, this is the x variable in the plot output.
plotit = is a logical argument, and when it is set to TRUE, it will output a plot of the 95% confidence log-likelihood curve. Setting it to FALSE prints all the the x vakues and y values separately.
interp = is a logical argument controlling if the spline interpolation is used or not. The spline interpolation fits low-degree polynomials to small subsets of the values, instead of fitting a single, high-degree polynomial to all of the values at once. You should use TRUE if plotting data with a lambda less than 100.
eps = defaults to 1/50 (0.02); this is the tolerance for lambda to be 0. (I used the default.)
xlab = defaults to \(\lambda\) (I used the default).
ylab = defaults to “log-likelihood” (I used the default).
Below, you will see where I conducted these transformations on both the one-way model (Entropy ~ Sex) and the two-way model (Entropy ~ Sex+Snout-vent length); however, after running the ANOVAs on these acoustic variables, I found the authors did not use the comparisons of the acoustic variables to the different sexes corrected for SVL to generate their F-values. For this reason, I commented out those boxcox functions where I calculated how to transform the entropy variables for the two-way model below but you can see below how they would have worked and how I would have added those transformations to the working dataframe (df).

require(MASS) # reported used for normalizing data for ANOVA
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
# boxcox of agg entropy vs sex.
boxcox(df$`Agg Entropy` ~ df$Sex, lambda = seq(-2, 2, 1/10), plotit = TRUE, # "object" 
       interp = TRUE, eps = 1/50, xlab = expression(lambda), #interp TRUE gives slightly lower lambda (like 0.9 vs 0.8 with FALSE)
       ylab = "log-Likelihood")

# transformation for Aggregated Entropy Vs Sex
df <- df %>% mutate(AggE_t1 = (`Agg Entropy`)^(-9/10)) # adding column of transformed AggEntropy

# boxcox of agg entropy vs Sex corrected for snout-vent length
#boxcox(df$`Agg Entropy` ~ df$Sex + df$LHC, lambda = seq(-2, 2, 1/10), plotit = TRUE, # object
#       interp = TRUE, eps = 1/50, xlab = expression(lambda), #"interp"
#       ylab = "log-Likelihood")
# transformation for Aggregated Entropy Vs Sex + SVL
#df <- df %>% mutate(AggE_t2 = (`Agg Entropy`)^(-7/10)) # adding column of transformed AggEntropy

# boxcox of Avg entropy vs Sex
boxcox(df$`Avg Entropy` ~ df$Sex, lambda = seq(-2, 2, 1/10), plotit = TRUE, # "object" 
       interp = TRUE, eps = 1/50, xlab = expression(lambda), #interp TRUE gives slightly lower lambda (like 0.9 vs 0.8 with FALSE)
       ylab = "log-Likelihood")

# transformation for Aggregated Entropy Vs Sex
df <- df %>% mutate(AvgE_t1 = (`Agg Entropy`)^(-1/10)) # adding column of transformed AvgEntropy

Acoustic Variables (response variable) vs Sex (predictor variable)

In the methods, the authors reported using {car} package for Anova function (though they do not report the type). To replicate this, I started by working with the two entropy variables I found how to best transform for them to meet normality criteria of ANOVA. I have commented out the work with these variables that produced incorrect F-values, relative to that reported in Table 1. Such work includes running a one-way and two-way Anova model with transformed Agg Entropy against the sexes and against the sexes corrected for snout-vent length. I did the same with Avg Entropy. I also determined what type (I, II, or III) of ANOVA the authors used below, as you can see for the Agg Entropy and Avg Entropy. For these two variables, since the F-values and Sums of Squares of the untransformed Agg Entropy vs Sex for three types of ANOVA are the same regardless of the type, meaning the variables are balanced and factors are orthogonal, I proceeded with the remaining acoustic variables by just running the aov() function. This seemed fruitful as for each variable, I successfully replicated the exact F-value and level of significance reported in Table 1 by the authors.
Below, for each acoustic variable included in Table 1, I do the following: 1. I make an ANOVA model (using the aov() function) of the acoustic variable to see if it varies between the three sexes (female (F), pregnant male (MP), non-pregnant male (NMP). Sex is the predictor variable, and the acoustic variable is the response variable. Moreover, for Table 1, the authors do NOT correct by snout-vent length (i.e., Sex+SVL) as conducting this correction in the model gives a different F-value than what the authors report in table 1.:
aov(data = df, [acoustic variable] ~ Sex) 2. I then print the full summary of the original ANOVA model using the summary.aov()
3. then I pull out the F-value from the model summary, filtering for the variable Sex since I am comparing the acoustic variable between the three sexes 4. I print the omnibus F-value so show that it has been isolated.

# Snout-Vent Length (mm) between different Sexes.
SVL_sex <- aov(data = df, LHC ~ Sex) # SVL = LHC
summary.aov(SVL_sex) # print the model summary 
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Sex          2  8.312   4.156   6.583 0.00469 **
## Residuals   27 17.046   0.631                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
# pulling out the F-statistic (nonparametric)
SVL_sex.F <- SVL_sex %>% # pull out the F-value
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
SVL_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 6.582981
# Mass (g) between different Sexes.
mass_sex <- aov(data = df, Peso ~ Sex) # Peso = Mass/weight
summary.aov(mass_sex) # print the model summary 
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Sex          2 0.3655 0.18275   25.43 6.17e-07 ***
## Residuals   27 0.1940 0.00719                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
# pulling out the F-statistic (nonparametric)
mass_sex.F <- mass_sex %>% # pull out the F-value
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
mass_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1!
## [1] 25.43115
# Note Duration (ND, s) between different Sexes. the article reports this is in ms in Table 1, but, based on the data set, they are in seconds. 
ND_sex <- aov(data = df, ND ~ Sex) # model
summary.aov(ND_sex) # print the model summary 
##             Df   Sum Sq  Mean Sq F value Pr(>F)  
## Sex          2 0.007553 0.003776    4.97 0.0142 *
## Residuals   28 0.021276 0.000760                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pulling out the F-statistic (nonparametric)
ND_sex.F <- ND_sex %>% # pull out the F-value
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
ND_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 4.969906
# Call Duration (CD, s) between different Sexes.
CD_sex <- aov(data = df, CD ~ Sex) # model
summary.aov(CD_sex) # print the model summary 
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Sex          2 0.4983 0.24917   8.122 0.00191 **
## Residuals   25 0.7669 0.03068                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 3 observations deleted due to missingness
# pulling out the F-statistic (nonparametric)
CD_sex.F <- CD_sex %>% # pull out the F-value
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
CD_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 8.122413
# Dominant Frequency (DF, Hz) between different Sexes.
DF_sex <- aov(data = df, DF ~ Sex) # model
summary.aov(DF_sex) # print the model summary 
##             Df  Sum Sq Mean Sq F value Pr(>F)  
## Sex          2  561049  280524   3.998 0.0297 *
## Residuals   28 1964686   70167                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pulling out the F-statistic (nonparametric)
DF_sex.F <- DF_sex %>% # pull out the F-value
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
DF_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 3.997933
# Aggregated Entropy between different Sexes.
AggE_sex <- aov(data = df, `Agg Entropy` ~ Sex) # untransformed, also pull out F-value
summary.aov(AggE_sex) # print the model summary 
##             Df Sum Sq Mean Sq F value Pr(>F)
## Sex          2  0.445  0.2223   0.757  0.479
## Residuals   28  8.226  0.2938
# pulling out the F-statistic (nonparametric)
AggE_sex.F <- AggE_sex %>% 
  generics::tidy() %>%
  filter(term == "Sex") # since comparing between sexes
AggE_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of Agg Entropy alone (full table will also show the p-value)
## [1] 0.7566687
# to show type 2 and 3 print the same F-value 
# type 2 ANOVA 
AggE_sex.a2 <- Anova(AggE_sex, type = "II")
# print the results
AggE_sex.a2  # the F value here IS EXACTLY WHAT THEY GOT IN TABLE 1????? so apparently, despite this variable not fulfilling normality criteria for linear models. This is very confusing to me. # type 3 ANOVA 
## Anova Table (Type II tests)
## 
## Response: Agg Entropy
##           Sum Sq Df F value Pr(>F)
## Sex       0.4446  2  0.7567 0.4786
## Residuals 8.2265 28
AggE_sex.a3 <- Anova(AggE_sex, type = "III")
# print the results
AggE_sex.a3
## Anova Table (Type III tests)
## 
## Response: Agg Entropy
##             Sum Sq Df  F value    Pr(>F)    
## (Intercept) 47.286  1 160.9427 3.967e-13 ***
## Sex          0.445  2   0.7567    0.4786    
## Residuals    8.226 28                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# All F-values and Sums of squares of the untransformed Agg Entropy vs Sex for three types of ANOVA, meaning  the variables are balanced and factors are orthogonal, so can trust the default type 1. 

# AggE vs Sex transformed (t1) vs sex; did this because the authors report doing boxcox-informed transformations, but gives a different F-value than what they reported in Table 1, so have commented them out. 
# AggE_sex <- aov(data = df, AggE_t1 ~ Sex)
# AggE_sex.a <- Anova(AggE_sex, type = c(2))
# AggE_sex.a 

# Average Entropy between different Sexes. Because of the above, I did not use the boxcox transformed Avg Entropy, and it works out that the F-value calculated below is the exact same as that reported in Table 1.
AvgE_sex <- aov(data = df, `Avg Entropy` ~ Sex) # untransformed, also pull out F-value
summary.aov(AvgE_sex) #printing the summary
##             Df Sum Sq Mean Sq F value Pr(>F)
## Sex          2  0.024 0.01193   0.097  0.908
## Residuals   28  3.438 0.12277
# to show type 2 and 3 print the same F-value 
# type 2 ANOVA 
AvgE_sex.a2 <- Anova(AvgE_sex, type = "II")
# print the results
AvgE_sex.a2 # same as the original and same as what the authors reported.
## Anova Table (Type II tests)
## 
## Response: Avg Entropy
##           Sum Sq Df F value Pr(>F)
## Sex       0.0239  2  0.0972 0.9077
## Residuals 3.4375 28
AvgE_sex.a3 <- Anova(AvgE_sex, type = "III")
# print the results
AvgE_sex.a3 # same as the original and same as what the authors reported.
## Anova Table (Type III tests)
## 
## Response: Avg Entropy
##             Sum Sq Df  F value Pr(>F)    
## (Intercept) 36.485  1 297.1863 <2e-16 ***
## Sex          0.024  2   0.0972 0.9077    
## Residuals    3.438 28                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# All F-values and Sums of squares of the untransformed Avg Entropy vs Sex for three types of ANOVA, meaning  the variables are balanced and factors are orthogonal.
# pulling out the F-statistic (nonparametric) from the type 1 ANOVA model
AvgE_sex.F <- AvgE_sex %>% 
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
AvgE_sex.F$statistic #AvgE F-value. matches what's reported in Table 1
## [1] 0.09718057
# Call Rate (calls/min) between the Sexes.
CRR_sex <- aov(data = df, CRR ~ Sex)
summary.aov(CRR_sex) #printing the model summary
##             Df Sum Sq Mean Sq F value Pr(>F)
## Sex          2  1.032  0.5160   1.819  0.181
## Residuals   28  7.943  0.2837
# pulling out the F-statistic (nonparametric)
CRR_sex.F <- CRR_sex %>%  # pull out the F-value
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
CRR_sex.F$statistic #Call Rate F-value, matches the F-value reported in Table 1
## [1] 1.818948
# Sound Pressure Level (SPL) between the Sexes.
SPL_sex <- aov(data = df, SPL ~ Sex) 
summary.aov(SPL_sex) #printing the summary
##             Df Sum Sq Mean Sq F value Pr(>F)
## Sex          2   45.2   22.59   1.374  0.274
## Residuals   22  361.7   16.44               
## 6 observations deleted due to missingness
# pulling out the F-statistic (nonparametric)
SPL_sex.F <- SPL_sex %>% # pull out the F-value
  generics::tidy() %>% # a nice table
  filter(term == "Sex") # since comparing between sexes
SPL_sex.F$statistic # SPL F-value
## [1] 1.373854
# dataframe of all the omnibus F-values from the above ANOVAs plus two empty columns so it will align with the Acoustics table below
F.stat <- data.frame(sex = " ", #will correspond with sex
                     n = " ", # will correspond with sex pop size
                     svl.F = SVL_sex.F$statistic, # snout-vent length
                     mass.F = mass_sex.F$statistic, # mass
                     nd.F = ND_sex.F$statistic, # note duration
                     cd.F = CD_sex.F$statistic, # call duration
                     df.F = DF_sex.F$statistic, # dominant frequency
                     aggE.F = AggE_sex.F$statistic, # aggregated entropy
                     avgE.F = AvgE_sex.F$statistic, # avg entropy
                     crr.F = CRR_sex.F$statistic, # call rep rate
                     spl.F = SPL_sex.F$statistic) # sound pressure level
F.stat
##   sex n    svl.F   mass.F     nd.F     cd.F     df.F    aggE.F     avgE.F
## 1       6.582981 25.43115 4.969906 8.122413 3.997933 0.7566687 0.09718057
##      crr.F    spl.F
## 1 1.818948 1.373854
detach(package:MASS)

Mean, SD, and F statistic of weight, snout-vent length, and acoustic variables by Sex

Here, I replicate the Mean and Standard Deviations reported for each acoustic variable for each sex in Table 1. I do this by

#for making nice tables and doing data table manipulations
require(data.table)
# making a table of the calculated mean and standard deviation for each acoustic variable reported in Table 1
bySex <- group_by(df, Sex)
Acoustics <- bySex %>% 
  summarise(n_cases = n(), # sample size in group
            mean_SVL = mean(LHC, na.rm = TRUE), # mean SVL
            sd_SVL = sd(LHC, na.rm = TRUE), # standard deviation SVL
            mean_mass = mean(Peso, na.rm = TRUE), # mean mass (g)
            sd_mass = sd(Peso, na.rm = TRUE), # sd mass
            mean_ND = mean(ND, na.rm = TRUE), # mean note duration (s)
            sd_ND = sd(ND, na.rm = TRUE), # sd note duration
            mean_CD = mean(CD, na.rm = TRUE), # mean call duration (s)
            sd_CD =  sd(CD, na.rm = TRUE),
            mean_DF = mean(DF, na.rm = TRUE), # mean dominant freq (Hz)
            sd_DF = sd(DF, na.rm = TRUE),
            mean_AggE = mean(`Agg Entropy`, na.rm = TRUE), # mean Aggregated Entropy
            sd_AggE = sd(`Agg Entropy`, na.rm = TRUE),
            mean_AvgE = mean(`Avg Entropy`, na.rm = TRUE), # mean Average Entropy
            sd_AvgE = sd(`Avg Entropy`, na.rm = TRUE),
            mean_CRR = mean(CRR, na.rm = TRUE), # mean Call/min 
            sd_CRR = sd(CRR, na.rm = TRUE),
            mean_SPL = mean(SPL, na.rm = TRUE), # mean sound pressure level  (dB)
            sd_SPL = sd(SPL, na.rm = TRUE)
            )
   
# Reorganizing the summarized dataframe created above to combine the mean and sd into the same column. 
Acoustics <- Acoustics %>% # changing labels back to their original state for the table here.
  mutate(Sex = case_when(Sex == "F" ~ "Females", 
                         Sex == "NPM" ~ "Non-pregnant males",
                         Sex == "PM" ~ "Pregnant males")) %>% 
  unite(mean_SVL, sd_SVL, col = "snout_vent_length", sep = " ± ") %>% 
  unite(mean_mass, sd_mass, col = "mass", sep = " ± ") %>% 
  unite(mean_ND, sd_ND, col = "note_duration", sep =" ± ") %>% 
  unite(mean_CD, sd_CD, col = "call_duration", sep = " ± ") %>% 
  unite(mean_DF, sd_DF, col = "dominant_frequencey", sep =" ± ") %>% 
  unite(mean_AggE, sd_AggE, col = "aggregate_entropy", sep = " ± ") %>% 
  unite(mean_AvgE, sd_AvgE, col = "average_entropy", sep = " ± ") %>% 
  unite(mean_CRR, sd_CRR, col = "call_repitition_rate", sep = " ± ") %>% 
  unite(mean_SPL, sd_SPL, col = "sound_pressure_level", sep = " ± ") %>% 
  transpose(keep.names = "Acoustic Variable") # flipping the orientation of the table so Sex will be at the top
# adding the F.stat table to the Acoustic table
F.stat <- F.stat %>% transpose(keep.names = "Acoustic_Variable")# flipping orientation
F.stat # printing to ensure it worked
##    Acoustic_Variable                 V1
## 1                sex                   
## 2                  n                   
## 3              svl.F   6.58298100665384
## 4             mass.F   25.4311542105701
## 5               nd.F   4.96990639070826
## 6               cd.F   8.12241272701847
## 7               df.F   3.99793257425526
## 8             aggE.F  0.756668690621354
## 9             avgE.F 0.0971805730424986
## 10             crr.F   1.81894792111049
## 11             spl.F   1.37385423279138
Acoustics <- mutate(Acoustics, F_value = F.stat$V1) %>% # adding the F-values to the Acoustics table
  rename("F" = V1, "NPM" = V2, "PM" = V3) # renaming the Acoustics data table columns 
Acoustics %>% # printing the final data table, using kable to make it look nice :)
  kable(align = 'l', booktabs = TRUE) %>% # left aligned data
  kable_styling(font_size = 11, full_width = TRUE) # font-size
Acoustic Variable F NPM PM F_value
Sex Females Non-pregnant males Pregnant males
n_cases 9 11 11
snout_vent_length 23.2222222222222 ± 0.974394398816231 21.99 ± 0.768042244208538 22.1727272727273 ± 0.643569590783948 6.58298100665384
mass 1.06222222222222 ± 0.101707642015949 0.79 ± 0.0673300329224139 0.964545454545455 ± 0.0839480358750146 25.4311542105701
note_duration 0.191305666333333 ± 0.0325399835957568 0.152636938090909 ± 0.0256564327093529 0.165423762818182 ± 0.0249443852582658 4.96990639070826
call_duration 1.87322247025 ± 0.201837134488656 1.5454524386 ± 0.165680686897357 1.6341607955 ± 0.161488820849156 8.12241272701847
dominant_frequencey 3390.71382922222 ± 245.998679493103 3711.98892163636 ± 228.989746357455 3651.45401072727 ± 309.224874457775 3.99793257425526
aggregate_entropy 2.292149032 ± 0.536556919319484 2.01411166836364 ± 0.469465368911618 2.04578614972727 ± 0.609866842967914 0.756668690621354
average_entropy 2.01343139866667 ± 0.280593615650822 1.95511875736364 ± 0.397587255429383 1.94988657527273 ± 0.350271792331423 0.0971805730424986
call_repitition_rate 1.71111111111111 ± 0.4371625682868 1.45454545454545 ± 0.522232967867094 1.25454545454545 ± 0.607229176445988 1.81894792111049
sound_pressure_level 64.38023576625 ± 3.7557077162866 61.5763027144444 ± 4.05125858350565 64.37328545625 ± 4.33670962047346 1.37385423279138
detach(package:data.table)

Need to round the output of functions.

Figure 3: (a) Note Duration, (b) Call Duration, (c) Dominant Frequency for males and females of Rhinoderma darwinii.

Inferential statistical analysis

Boxplots of Acoustic Variables across Different Sexes with Post-hoc Tukey significance

Whiskers represent the 90th and 10th percentiles.

require(cowplot)
# A Boxplot of Note Duration
fig3_A <- ggplot(data = df, aes(x = Sex, y = ND)) + # made the object
  geom_boxplot(na.rm = TRUE) + # boxplot
  labs(x = "Sex", y = "Note Duration (s)", title = "A") + # axis labels and title with figure label
  scale_y_continuous(breaks = c(0.15, 0.20, 0.25, 0.30), # so axis matches that in the article
                     position = 'left') +
  theme_classic()
# B Boxplot of Call Duration
fig3_B <- ggplot(data = df, aes(x = Sex, y = CD)) + 
  geom_boxplot(na.rm = TRUE) + 
  labs(x = "Sex", y = "Call Duration (s)", title = "B") +  # scaling didn't work here. RIP
  theme_classic()
# C Boxplot of Dominant Frequency 
fig3_C <- ggplot(data = df, aes(x = Sex, y = DF)) + #define object
  geom_boxplot(na.rm = TRUE) + # boxplot
  labs(x = "Sex", y = "Dominant Frequency (Hz)", title = "C") + # axis labels and title with figure label, couldn't get the y-axis label to match here either.
  theme_classic()
# organize the output of figures using cowplot
plot_grid(fig3_A, fig3_B, fig3_C, label_size = 8, nrow = 1)

detach(package:cowplot)

Figure 4: Linear Discrimination functions (LD1 and LD2) representing distinctiveness in Rhinoderma darwinii advertisement calls.

visualization filter data for 4 notes per call (NC == 4, n = 22). drop_na() using MASS::lda() Linear Discriminant Analysis LD1 = f and npm, LD2 = f and pm

groups + x1 + x2

require(MASS)
df <- df %>%
    drop_na(DF, ND, CRR, `Agg Entropy`, SPL)
Sex1 <- df %>%
    dplyr::filter(Sex != "PM")  # filtering for only females and nonpregnant males
Sex2 <- df %>%
    dplyr::filter(Sex != "NPM")  # filtering for only females and pregnant males
lda1 <- MASS::lda(Sex ~ DF + ND + CRR + `Agg Entropy` + SPL, data = Sex1)
LD1 <- lda1$scaling
# normalize : subtract mean, divide by SD,
detach(package:MASS)

Summary/Discussion

Narrative section that overviews how successful were you at replicating the analyses and visualizations in the study.
1. What problems did you encounter? Some of the data described in the methods isn’t included in the data frame. Transformations (how, for what analyses, and which variables) not inclduded in the methods 2. Why might you have encountered those problems?
3. What details were lacking from the original study’s methods that might have hampered your ability to replicate the authors’ results?

References

• Sandmeier, F. (2016). Rhinoderma darwinii [Encyclopedia]. AmphibiaWeb; University of California, Berkeley, CA, USA. https://amphibiaweb.org/species/4322

• Serrano, J. M., Penna, M., Valenzuela-Sánchez, A., Mendez, M. A., & Azat, C. (2020). Monomorphic call structure and dimorphic vocal phenology in a sex-role reversed frog. Behavioral Ecology and Sociobiology, 74(10), 127. https://doi.org/10.1007/s00265-020-02903-3